AITopics | structural similarity

Collaborating Authors

structural similarity

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

0be40478ab6ee0006ee3b38b158bbc8f-Paper-Conference.pdf

Neural Information Processing SystemsFeb-7-2026, 18:33:55 GMT

cell type, representation, sccello, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Europe > Portugal > Braga > Braga (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.92)
(2 more...)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)

Add feedback

SynTwins: A Retrosynthesis-Guided Framework for Synthesizable Molecular Analog Generation

Chen, Shuan, Nam, Gunwook, Aspuru-Guzik, Alan, Jung, Yousung

arXiv.org Artificial IntelligenceNov-25-2025

The disconnect between AI-generated molecules with desirable properties and their synthetic feasibility remains a critical bottleneck in computational discovery of drugs and materials. While generative AI has accelerated the proposal of candidate molecules, many of these structures prove challenging or impossible to synthesize using established chemical reactions. Here, we introduce SynTwins, a novel retrosynthesis-guided molecule design framework that finds synthetically accessible molecular analogs by emulating expert chemists' strategies in three steps: retrosynthesis, searching similar building blocks, and virtual synthesis. Using a search algorithm instead of a stochastic data-driven generator, SynTwins outperforms state-of-the-art machine learning models at exploring synthetically accessible analogs while maintaining high structural similarity to original target molecules. Furthermore, when integrated into existing molecular property-optimization frameworks, our hybrid approach produces synthetically feasible analogs with minimal loss in property scores. Our comprehensive benchmarking across diverse molecular datasets demonstrates that SynTwins effectively bridges the gap between computational design and experimental synthesis, providing a practical solution for accelerating the discovery of synthesizable molecules with desired properties for a wide range of applications.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2507.02752

Country:

North America > United States (0.74)
North America > Canada > Ontario (0.28)

Genre: Research Report (0.85)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government > FDA (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Add feedback

Cell ontology guided transcriptome foundation model

Neural Information Processing SystemsOct-9-2025, 18:20:04 GMT

Cells are basic units of all living organisms.

cell type, representation, sccello, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Europe > Portugal > Braga > Braga (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.92)
(2 more...)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Add feedback

all of the components ", has an interesting "idea of stabilizing training ", and "achieves state-of-the-art performance. "

Neural Information Processing SystemsOct-3-2025, 03:36:16 GMT

We thank the reviewers for their time and valuable feedback. Below, we clarify several important points raised by the reviewers. An extra page in the final version will allow us to include the requested details. We believe these clarifications, together with new analyses, resolve all key issues raised. Rep'16] and provides a highly constraining measure of local topology.

artificial intelligence, gnng uard, graph, (17 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.31)

Technology: Information Technology > Artificial Intelligence (0.32)

Add feedback

POLO: Preference-Guided Multi-Turn Reinforcement Learning for Lead Optimization

Wang, Ziqing, Wen, Yibo, Pattie, William, Luo, Xiao, Wu, Weimin, Hu, Jerry Yao-Chieh, Pandey, Abhishek, Liu, Han, Ding, Kaize

arXiv.org Artificial IntelligenceSep-29-2025

Lead optimization in drug discovery requires efficiently navigating vast chemical space through iterative cycles to enhance molecular properties while preserving structural similarity to the original lead compound. Despite recent advances, traditional optimization methods struggle with sample efficiency-achieving good optimization performance with limited oracle evaluations. Large Language Models (LLMs) provide a promising approach through their in-context learning and instruction following capabilities, which align naturally with these iterative processes. However, existing LLM-based methods fail to leverage this strength, treating each optimization step independently. To address this, we present POLO (Preference-guided multi-turn Optimization for Lead Optimization), which enables LLMs to learn from complete optimization trajectories rather than isolated steps. At its core, POLO introduces Preference-Guided Policy Optimization (PGPO), a novel reinforcement learning algorithm that extracts learning signals at two complementary levels: trajectory-level optimization reinforces successful strategies, while turn-level preference learning provides dense comparative feedback by ranking intermediate molecules within each trajectory. Through this dual-level learning from intermediate evaluation, POLO achieves superior sample efficiency by fully exploiting each costly oracle call. Extensive experiments demonstrate that POLO achieves 84% average success rate on single-property tasks (2.3x better than baselines) and 50% on multi-property tasks using only 500 oracle evaluations, significantly advancing the state-of-the-art in sample-efficient molecular optimization.

large language model, machine learning, reinforcement learning, (21 more...)

arXiv.org Artificial Intelligence

2509.21737

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Data-driven Methods of Extracting Text Structure and Information Transfer

Honna, Shinichi, Murayama, Taichi, Matsui, Akira

arXiv.org Artificial IntelligenceSep-17-2025

The Anna Karenina Principle (AKP) holds that success requires satisfying a small set of essential conditions, whereas failure takes diverse forms. We test AKP, its reverse, and two further patterns described as ordered and noisy across novels, online encyclopedias, research papers, and movies. Texts are represented as sequences of functional blocks, and convergence is assessed in transition order and position. Results show that structural principles vary by medium: novels follow reverse AKP in order, Wikipedia combines AKP with ordered patterns, academic papers display reverse AKP in order but remain noisy in position, and movies diverge by genre. Success therefore depends on structural constraints that are specific to each medium, while failure assumes different shapes across domains.

machine learning, natural language, transition, (18 more...)

arXiv.org Artificial Intelligence

2509.12999

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.93)

Add feedback

Understanding Subword Compositionality of Large Language Models

Peng, Qiwei, Chai, Yekun, Søgaard, Anders

arXiv.org Artificial IntelligenceAug-26-2025

Large language models (LLMs) take sequences of subwords as input, requiring them to effective compose subword representations into meaningful word-level representations. In this paper, we present a comprehensive set of experiments to probe how LLMs compose subword information, focusing on three key aspects: structural similarity, semantic decomposability, and form retention. Our analysis of the experiments suggests that these five LLM families can be classified into three distinct groups, likely reflecting difference in their underlying composition strategies. Specifically, we observe (i) three distinct patterns in the evolution of structural similarity between subword compositions and whole-word representations across layers; (ii) great performance when probing layer by layer their sensitivity to semantic decompositionality; and (iii) three distinct patterns when probing sensitivity to formal features, e.g., character sequence length. These findings provide valuable insights into the compositional dynamics of LLMs and highlight different compositional pattens in how LLMs encode and integrate subword information.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2508.17953

Country:

Europe (0.93)
North America > United States > Minnesota (0.28)
Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback

QUDsim: Quantifying Discourse Similarities in LLM-Generated Text

Namuduri, Ramya, Wu, Yating, Zheng, Anshun Asher, Wadhwa, Manya, Durrett, Greg, Li, Junyi Jessy

arXiv.org Artificial IntelligenceAug-12-2025

As large language models become increasingly capable at various writing tasks, their weakness at generating unique and creative content becomes a major liability. Although LLMs have the ability to generate text covering diverse topics, there is an overall sense of repetitiveness across texts that we aim to formalize and quantify via a similarity metric. The familiarity between documents arises from the persistence of underlying discourse structures. However, existing similarity metrics dependent on lexical overlap and syntactic patterns largely capture $\textit{content}$ overlap, thus making them unsuitable for detecting $\textit{structural}$ similarities. We introduce an abstraction based on linguistic theories in Questions Under Discussion (QUD) and question semantics to help quantify differences in discourse progression. We then use this framework to build $\textbf{QUDsim}$, a similarity metric that can detect discursive parallels between documents. Using QUDsim, we find that LLMs often reuse discourse structures (more so than humans) across samples, even when content differs. Furthermore, LLMs are not only repetitive and structurally uniform, but are also divergent from human authors in the types of structures they use.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2504.09373

Country:

North America > United States (1.00)
Asia (1.00)
Europe (0.67)

Genre: Research Report (0.82)

Industry: Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

Scale-Invariance Drives Convergence in AI and Brain Representations

Yu, Junjie, Ma, Wenxiao, Zhang, Jianyu, Deng, Haotian, Deng, Zihan, Guo, Yi, Liu, Quanying

arXiv.org Artificial IntelligenceJun-17-2025

Despite variations in architecture and pretraining strategies, recent studies indicate that large-scale AI models often converge toward similar internal representations that also align with neural activity. We propose that scale-invariance, a fundamental structural principle in natural systems, is a key driver of this convergence. In this work, we propose a multi-scale analytical framework to quantify two core aspects of scale-invariance in AI representations: dimensional stability and structural similarity across scales. We further investigate whether these properties can predict alignment performance with functional Magnetic Resonance Imaging (fMRI) responses in the visual cortex. Our analysis reveals that embeddings with more consistent dimension and higher structural similarity across scales align better with fMRI data. Furthermore, we find that the manifold structure of fMRI data is more concentrated, with most features dissipating at smaller scales. Embeddings with similar scale patterns align more closely with fMRI data. We also show that larger pretraining datasets and the inclusion of language modalities enhance the scale-invariance properties of embeddings, further improving neural alignment. Our findings indicate that scale-invariance is a fundamental structural principle that bridges artificial and biological representations, providing a new framework for evaluating the structural quality of human-like AI systems.

alignment, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2506.12117

Genre: Research Report > New Finding (1.00)

Industry: